Reconstructing Ancient Literary Texts from Noisy Manuscripts

Authors

  • Moshe Koppel
  • Moty Michaely
  • Alex Tal

Abstract

Given multiple corrupted versions of the same text, as is common with ancient manuscripts, we wish to reconstruct the original text from which the extant corrupted versions were copied (typically via latent intermediary versions). This is a challenge of cardinal importance in the humanities. We use a variant of expectation-maximization (EM) to solve this problem and demonstrate the efficacy of our method on both synthetic and real-world data.
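
The abstract does not spell out the model, so the following is only a rough illustration of how EM can recover an original text from several noisy copies. It is a plain EM sketch, similar in spirit to the Dawid–Skene model of annotator reliability, and not the authors' method. It assumes a much simpler setting than the paper's: the witnesses are already aligned and of equal length, errors are substitutions only, and every witness was copied directly from the original (no latent intermediaries). The function name `reconstruct` and the per-witness accuracy parameter `acc` are hypothetical.

```python
import math


def reconstruct(witnesses, iterations=20):
    """Estimate an original text from aligned, equal-length witnesses via EM.

    Hypothetical simplified model (not the paper's): witness w copies each
    character correctly with probability acc[w]; otherwise it writes one of
    the other alphabet symbols uniformly at random. Positions are independent
    and every witness is copied directly from the original.
    """
    n = len(witnesses[0])
    if any(len(t) != n for t in witnesses):
        raise ValueError("witnesses must be aligned to equal length")
    alphabet = sorted(set("".join(witnesses)))
    k = len(alphabet)
    if k <= 1:  # empty texts, or a single repeated symbol: nothing to infer
        return witnesses[0], [1.0] * len(witnesses)

    acc = [0.8] * len(witnesses)  # initial guess for each witness's accuracy
    posteriors = []
    for _ in range(max(1, iterations)):
        # E-step: posterior over the original symbol at each position.
        posteriors = []
        for i in range(n):
            logp = {}
            for c in alphabet:
                s = 0.0
                for w, text in enumerate(witnesses):
                    if text[i] == c:
                        s += math.log(acc[w])
                    else:
                        s += math.log((1 - acc[w]) / (k - 1))
                logp[c] = s
            m = max(logp.values())  # subtract the max for numerical stability
            z = sum(math.exp(v - m) for v in logp.values())
            posteriors.append({c: math.exp(v - m) / z for c, v in logp.items()})
        # M-step: accuracy = expected fraction of positions copied correctly.
        for w, text in enumerate(witnesses):
            expected = sum(posteriors[i][text[i]] for i in range(n))
            acc[w] = min(max(expected / n, 1e-3), 1 - 1e-3)  # avoid log(0)

    # Pick the most probable original symbol at each position.
    best = "".join(max(p, key=p.get) for p in posteriors)
    return best, acc


if __name__ == "__main__":
    best, acc = reconstruct(["the cat sat", "the bat sat", "the cat sit", "tha cat sat"])
    print(best)  # the cat sat
```

Unlike a simple majority vote, the M-step learns how reliable each witness is, so a witness that agrees with the consensus everywhere ends up with a higher `acc` and carries more weight where copies disagree. Handling insertions, deletions, and the latent copying tree mentioned in the abstract would require a substantially richer model.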

Similar sources

A Computational Model of Text Reuse in Ancient Literary Texts

We propose a computational model of text reuse tailored for ancient literary texts, available to us often only in small and noisy samples. The model takes into account source alternation patterns, so as to be able to align even sentences with low surface similarity. We demonstrate its ability to characterize text reuse in the Greek New Testament.

Reconstructing the Horizon Expectation of the Decanters in Safavid Era Age

Surāhī (decanter) can be regarded as one of the most common types of drinking vessels in Persian art, as well as one of the containers most frequently mentioned in Persian literature (especially mystical literature). In this study, to complete this information for the Safavid era in both domains, Hans Robert Jauss's method of reconstructing the horizon of expectations was employed. Explaining the theoret...

Literary Figures in Gāthic Texts

Introduction: Gāthic texts are a collection of religious songs of Zarathustra, who lived about 1200 BC. Of the seventy-two hāts (stanzas) of Yasna (one of the five chapters of Avesta), seventeen hāts belong to five Gāthas. These seventeen hāts have been classified into five categories based on their syllabic meter and the number of the song: 1) ahunavaiti, 2) ushtavaiti, 3) spanta.mainyu, ...

Computational Methods for Coptic: Developing and Using Part-of-Speech Tagging for Digital Scholarship in the Humanities

This paper motivates and details the first implementation of a freely available part-of-speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evalu...

Computational Methods for Coptic

This paper motivates and details the first implementation of a freely available part-of-speech tag set and tagger for Coptic. Coptic is the last phase of the Egyptian language family and a descendant of the hieroglyphs of ancient Egypt. Unlike classical Greek and Latin, few resources for digital and computational work have existed for ancient Egyptian language and literature until now. We evalu...

Publication date: 2016